Algorithmic Explorations of Unimolecular and Bimolecular Reaction Spaces

Abstract Algorithmic reaction exploration based on transition state searches has already made inroads into many niche applications, but its potential as a general‐purpose tool is still largely unrealized. Computational cost and the absence of benchmark problems involving larger molecules remain obstacles to further progress. Here an ultra‐low cost exploration algorithm is implemented and used to explore the reactivity of unimolecular and bimolecular reactants, comprising a total of 581 reactions involving 51 distinct reactants. The algorithm discovers all established reaction pathways, where such comparisons are possible, while also revealing a much richer reactivity landscape, including lower barrier reaction pathways and a strong dependence of reaction conformation in the apparent barriers of the reported reactions. The diversity of these benchmarks illustrate that reaction exploration algorithms are approaching general‐purpose capability.


Elementary Reaction Enumeration
Graph-based ERSs are used by YARP to enumerate potential products for a given set of reactants as a prerequisite for TS searches. These ERSs take the form of rules for "break m bonds" (bm), "form n bonds" (fn), and "transfer x electrons", akin to standard arrow pushing. The differences between the Lewis structures of any reactant and product are expressible in terms of such an ERS (e.g., an S N 2 reaction is a b1f1 reaction). Graph-based ERSs can be considered a generalization of typical named reactions, since the latter are a constrained subset of the former (e.g., a Diels-Alder reaction is a specific occurrence of a b3f3 reaction).
To develop the current benchmarks, we make use of the fact that graph-based ERSs define reaction subspaces that can be comprehensively characterized for relatively large systems.
For example, applying all b1f1 reactions to a given nucleophile and molecule would automatically include the set of all possible S N 2 products. This aspect of graph-based ERSs is used here to comprehensively explore the b2f2 reaction subspaces of several neutral closed-shell reactants. For these species, the b2f2 ERS is the simplest reaction rule that yields nontrivial products. 1 As shown below, the TSs of many physically relevant b3f3 reactions have ii sufficient configurational overlap with the TSs of b2f2 reactions that they are still discovered during the TS search, even if they are not directly enumerated. For reference, a comparison of the cost and coverage of b2f2 and b3f3 enumeration schemes is provided in Figure S6.
For the relatively large l-glucose system, a conditional b3f3 (cb3f3) ERS has also been used

Pre-pruning the potential products
Even strictly following the simplest b2f2 ERS, the number of possible products increases as O(N 2 ) where N refers to the number of unique bonds in the reactants. To reduce the search space, the principle of thermodynamic control can be applied to prune the potential products. 2 Since the activation energy is bound from below by the heat of reaction, relatively endothermic reactions may also be safely excluded from a search based on kinetic factors.
Within YARP v2.0 an optional threshold on the enthalpies of reaction (∆H ⊖ r ) has been incorporated to down-select reactions before preforming costly TS characterizations (Fig. 1b).
Heat of reaction calculations were performed using TCIT, 3,4 an inexpensive and extensible component theory parameterized exclusively from quantum chemistry data.
iii Finding an optimal value of the ∆H ⊖ r threshold is possible, but typically impractical. For example, this threshold could be determined in an iterative manner by incrementally raising it and characterizing the TSs (vide infra) until all discovered reactions had barriers less than or equal to the threshold, thus exhausting all kinetically relevant products. In practice, setting the threshold this high excludes essentially no channels making the filter useless.
Here, the ∆H ⊖ r threshold was set to be 80 kJ/mol for unimolecular degradation network problems. This value was selected based on comparisons with unfiltered results for the KHP benchmark ( Fig. S3) and the observation of no loss of kinetically relevant channels. At room temperature, 80 kJ/mol endothermic products would have sub parts per trillion equilibrium concentrations, but the possibility of their kinetic relevance as intermediates cannot be ruled out a priori. The relevant threshold value obviously depends on the reactant. For a very stable reactant, like methyl butanoate (MB), there are relatively few exothermic single-step reactions and thus a higher threshold is appropriate. For the bimolecular reactants and MB a much higher ∆H ⊖ r threshold of 200 kJ/mol was used to avoid throwing out potentially relevant reactions. For the second step of exploring the MB network, the lower 80 kJ/mol threshold was reused due to the larger number of exothermic single-step reactions after the initial endothermic step.

Geometry Initialization
The reactions generated by graph-based ERS enumeration must be converted to geometries as a prerequisite for the double-ended TS search. The manner in which this is done is extremely consequential for localizing meaningful TSs, especially for molecules with significant conformational freedom. 5 YARP v2.0 supports two algorithms for generating reaction geometries: the joint-optimization protocol published in the original YARP paper 1 and the conformational sampling plus joint-optimization algorithm that was recently developed by our group. 5 The latter algorithm uses CREST 6 to generate reactants and/or product conformers that are then jointly-optimized and ranked by a random forest classification model iv (See Reference 5 for additional details). Both algorithms treat unimolecular and bimolecular reactions on an equal footing. The only distinction is that when using the CREST-based algorithm, multimolecular conformers were discarded if they exhibited a centroid−centroid intermolecular separation greater than twice the sum of the molecular radii. Based on previous benchmarks, these algorithms provide high quality reactive configurations for both unimolecular and bimolecular systems in an automated fashion. Here, both schemes are retained to generate at most 1 + N reaction conformations for each attempted reaction (one from the joint-optimization and N from reactant conformational sampling). N is taken as 3 in all case studies, and only those reaction conformations that have an intended score greater than 0.5 predicted by the pretrained random forest classifier were retained for double-ended searches. 5

Transition State Localization
TSs are characterized in three steps by YARP. First, a double-ended search is performed to estimate the saddle-point. Here, the growing string algorithm is used for this step, as implemented in the pyGSM package maintained by the Zimmerman group. 7 The highest energy node from the GSM search is used as an initial guess for the TS geometry. Second, a TS refinement is performed starting from the initial guess obtained from the double-ended search. Here Berney optimization 8 is used for this step, as implemented in Gaussian 16. 9 Third, the refined TS is characterized by an intrinsic reaction coordinate (IRC) calculation.
The IRC calculation consists of performing gradient descent optimizations along the imaginary frequency mode of the TS, such that the reactant and product corresponding to the TS can be identified. Those TSs where the IRC structures match the enumerated reactant and product (i.e., those that were used to initiate the double-ended search) are referred to as "intended", otherwise the TS is labeled as "untintended". YARP v2.0 introduces two improvements in how these steps are performed and how data are utilized. The first improvement is that the TS localization steps are entirely performed v at a low-level of theory before down-selecting the reactions that merit high-level characterization (Fig. 1c). The low-level characterizations are used to eliminate TSs that are degenerate (e.g., from different reaction conformations) or whose IRC does not include the nominal reactant. In other words, if a TS is unintended but still matches with the reactant, it is retained for high-level TS characterization. The rationale for this is that an unintended TS involving the reactant is still connected to the reaction network that is being explored and therefore of possible relevance. This is the second improvement introduced by YARP v2.0, since previously all unintended TSs were discarded during network exploration. For instance, a TS discovered as part of a b2f2 enumerated reaction may actually correspond to a b3f3 TS, even if the b3f3 ERS was not used directly in enumeration. All down-selected TSs are then refined via high-level Berney optimization using the low-level Berney-optimized TS as an initial guess (Fig. 1d). Here, GFN2-xTB 10 was used as the reference low-level theory (ML potentials and low-level DFT functionals are also available options). For the L-Glucose system, B3LYP-D3/TZVP density functional theory (DFT) was used as the highlevel method for comparison with previous work. For all other systems, M05-2x/def2-SVP with D3 dispersion correction 11 was used as the high-level method based on its balance of cost and accuracy and recommendation over B3LYP and M06-2X in a recent benchmark. 12 To provide additional data on the effect of functional choice, up to of the ten lowest barrier reactions involving each reactant and intermediate were refined (both geometries and energies) at the range-separated ωB97XD/def2-TZVP 13 level. The deviations between the two functionals are shown in Figure S2, and the more accurate range-separated values are used in the reaction diagrams reported in the main text. After obtaining the high-level Berney-optimized TSs, there is the question of whether to perform a high-level IRC calculation. The IRC calculation is typically more expensive than the Berney optimization. Since the reactions have already been filtered by the low-level IRC calculation, just adopting the low-level reaction assignments would save computational cost. However, this would risk misclassifications in cases where the high-level TS underwent vi a relatively large geometry rearrangement or where the low-level was inaccurate. In the current study we have performed high-level IRC calculations for all of the unimolecular reaction networks and bimolecular benchmarks. To investigate the feasibility of selectively performing high-level IRC calculations, we have used the high-level IRC data from these benchmarks to train an ML-based XGBoost classifier 14 to estimate the probability that a given high-level TS is intended. The performance and training details for this classifier are provided in the SI section 8. As a proof-of-principle, this classifier is used in the L-Glucose study to down-select TSs predicted to be unintended for IRC calculations.  which indicates the similarity between the M052x-D3/def2-SVP and ωB97XD/def2-TZVP level potential energy surfaces. These similarities suggest that there is marginal benefit for using the range-separated functional in the current cases. For example, the differences in activation energy between the functionals is typically much smaller than the range of activation energies observed for distinct conformers. Since conformational sampling is neglected in many automated exploration studies, we note that spending time on higher level functionals or multireference calculations without conformational sampling misrepresents the potential sources of error affecting the discovered TSs.

Activation Energies Calculated at Different Levels of Theory
To estimate the errors associated with using DFT rather than a wavefunction based method, the DFT activation energies were also compared with CCSD(T)-F12/cc-pVDZ-F12 To establish the effects of the down-selection procedures introduced in YARP v2.0 (i.e., the heat of reaction filter and treatment of unintended channels discussed in the main text), we have compared differences with YARP v1.0 in the products that were discovered in the first step of KHP reaction exploration (Fig. S3). In total, YARP v2.0 excluded four intended products that were discovered by YARP v1.0, while discovering four new intended products and seven unintended products. The bottom three missing products were discarded after the reaction enumeration step since they exceed the enthalpy of reaction threshold (80 kJ/mol, around 20 kcal/mol). Based on the YARP v1.0 results, they correspond to kinetically irrelevant reactions, with activation energies more than 40 kcal/mol larger than the most kinetically favorable reaction. The TS search was performed for the first missing product but failed to locate an intended TS with YARP v2.0. This is caused by the additional GFN2-xTB level Berney optimization that is performed in YARP v2.0 (i.e., in v1.0 only the growingstring calculation was performed at the low-level). Nevertheless, this missing product also connected by a reaction with a barrier 40 kcal/mol larger than the most kinetically favorable reaction.
On the other hand, four newly discovered products indicate that YARP v2.0 increased the intended reaction coverage. This is mainly due to conformational sampling in YARP v2.0, albeit in this case the resulting intended products are all relatively high barrier. Many unintended channels were also discovered, including two of potential kinetic relevance (Fig.   S3). x

Missing products Newly discovered intended products
Newly discovered unintended products

Reaction Filtering Examples
Two examples are provided to illustrate how YARP v2.0 down-selects the potential reactions and reaction conformations (Fig. S4). The first example, propylene carbonate (PC), is the reactant with the highest down-selection rate among reactants with more than 5 DFT level products. The second example, oxetane-2-yl hydroperoxide (OH), is an intermediate in the KHP network that exhibits a near-average down-selection rate.
In the first step, TCIT 3,4 is applied to predict the enthalpy of formation (∆H f ) of the reactant and the enumerated b2f2 products. With the enthalpy of reaction (∆H ⊖ r ) threshold of 80 kJ/mol, 20 and 7 products are excluded in the PC and OH systems, respectively.
Reaction conformational sampling (RCS) is then applied to the remaining products. In this step, 2 and 3 products are excluded for PC and OH, respectively, because no reaction conformations were generated with sufficiently high intended probability for these products.
In the third step, GFN2-xTB level growing string method (GSM) and Berny TS optimization are performed and the products with no successful reaction channel (i.e. either GSM or TS optimization do not converge) are excluded. In both systems, all products pass this step.
At this stage if reaction conformations result in a same TS (i.e. showing a single point energy difference smaller than 10 −4 Hartree) they will be treated as duplicates and only one will be kept. The last down-selection step is the GFN2-xTB level intrinsic reaction coordinate (IRC) calculation. If the low-level IRC analysis is unintended and the reactant is not included in either of the IRC endpoints then the reaction will be excluded from highlevel characterization. For PC and OH, 4 and 10 products, respectively, were excluded since none of their IRCs satisfied the retention criteria. Note that since multiple reaction conformations are used there are possibly multiple IRCs associated with a product. In total, this down-selection process excludes 72.2% and 57.1% enumerated products from highlevel characterization for PC and OH, respectively, which helps to dramatically decrease the computational cost. Section 5 discusses whether this process leaves out important reactions xii due to the inaccuracy of the GFN2-xTB calculations.
Activation energy threshold. Besides the above mentioned down-selection procedures, an activation energy threshold (∆G † thresh ) can be set to avoid exploring the reactions with relatively high barriers. For example, when studying reaction conditions at 100 • C, only reactions with ∆G † ≤ 45 kcal/mol exhibit appreciable kinetics on lab timescales. 21 In YARP this filter is applied after the low-level TS calculation and for screening high-level unintended TSs. A 5 kcal/mol energy window is added when applying the low-level filter to acknowledge the relative inaccuracy of the low-level method (GFN2-xTB, in this case). Thus, setting a ∆G † thresh of 45 kcal/mol will exclude reactions with ∆G † > 50 kcal/mol computed at GFN2-xTB level and unintended reactions with ∆G † > 45 kcal/mol computed at the DFT level.
This filter was only used in the present study for the L-glucose pyrolysis benchmark. Table S1: Statistics of additional test with B3LYP/6-31+G* as low cost method. The "reactions attempted" column corresponds to the number of TSs that were down-selected for eventual high-level characterization in each scenario. The "I/U" column reports the number of intended and retained unintended products found after the high-level IRC step. To investigate GFN2-xTB down-selection as a potential source of error, six reactants with large low-level down-selection rate (step 3 and 4 in the process illustrated in Fig. S4) were reevaluated using B3LYP/6-31+G* as a low-level method (Table S1). The keyword "I/U" refers to the number of intended and retained unintended products and "xTB" ("B3LYP") refers to the YARP v2.0 results using xTB (B3LYP/6-31G*) as the low-level method. For xiii four out of the six systems, using xTB as the low-level method results in finding the same (or more) intended channels as using B3LYP/6-31+G*. For 1,3-dioxan-2-one (O=C1OCCCO1) and trimethylphosphane plus oxirane (CP(C)C.O1CC1), using xTB as a low-level method misses one intended channel. However, these missed reactions are relatively high activation energy and are not considered important. Using B3LYP/6-31+G* as a low-level method does result in two retained unintended products, both from the PC network, that are more kinetically favorable than the reactions explored by xTB for these two reactants (Fig. S5).  Figure S5: Two kinetically favorable reactions that are discovered as retained unintended channels using B3LYP/6-31+G* as a low-level method that are missed when using xTB.

Using DFT as a Low-Level Method with YARP
Although the two methods perform similarly as low-level chemistries, they exhibit dra- So the computational cost of B3LYP/6-31+G* level of theory is around 90 times the cost of xTB, which roughly matches with our previous observations for YARP v1.0. 1

Comparing b2f2 and b3f3 Product Enumeration
The selection of the elementary reaction step (ERS) is essential in YARP, since it determines the reaction coverage and computational cost. This topic was discussed extensively in the original YARP publication, in particular as regards under what conditions it is necessary to include higher order ERSs like b3f3 for neutral closed-shell reactants. 1 In that study we found that remarkably little chemistry was missed when excluding b3f3, but reactions like Diels- To compare the coverage and cost of b2f2 and b3f3 exploration with inclusion of unintended reactions, the first step of reaction exploration for PC was reperformed with the b3f3 ERS and the number of products, intended rate, number of DFT gradient calls (GC) and gradient calls per intended product (GCPI) were compared with only using b2f2 enumeration (Fig. S6). The numbers of b2f2 products (10) and b3f3 products (12) are similar (Fig.  S6a); however, the number of b3f3 products grows much faster than b2f2 products as system sizes increase. The intended rate of b2f2 exploration is 70%, which is over twice the intended rate of b3f3 exploration (33.3%, Fig. S6b). This ratio reflects the difficulty of converging b3f3 TSs (and potentially their relative scarcity on the chemical potential energy surface) compared with b2f2 TSs. As for the computational cost, b2f2 exploration required 69 GCs, while b3f3 required 137 GCs, which is twice the cost of b2f2 exploration alone. The larger number of GCs is indicative of the b3f3 TSs undergoing larger structural changes from the low-level estimation on average compared with b2f2, which is also indirectly related to the low intended b3f3 rate. The GCPI of b3f3 is approximately three times the GCPI of b2f2 exploration, which summarizes the low efficiency of b3f3 searches compared with b2f2.
Despite the additional costs of b3f3, the question remains: what products are missing from the b2f2 exploration that can only be found using b3f3 enumeration? Only one intended reaction (I) and two unintended reactions (II, III) were discovered by b3f3 exploration that were not characterized during b2f2 exploration (Fig. S6e). The two unintended reactions are actually b2f2 reactions that were enumerated by the b2f2 search but were excluded from exploration based on the ∆H ⊖ r threshold. Thus, only one additional reaction is authentically discovered by b3f3 exploration, and in this case is high barrier and kinetically irrelevant.
Besides this testing demonstration, we note that the Diels-Alder reaction, the entire Korcek mechanism, as well as other b1f1,b2f3,b3f3 and even b4f4 reactions are reported in the main text and were discovered as retained unintended channels using only b2f2 enumeration to guide the search. Thus, we recommend b2f2 or conditional b3f3 searches as typically striking a balance between reaction coverage and computational cost for relatively simple organic reaction exploration.

Analysis of Outliers
This section discusses the reactions that were outliers with respect to the number of gradient calls and range of ∆G † from the unimolecular and bimolecular reaction explorations. The xvi reactions with a number of GC greater than 30 or a range of ∆G † larger than 15 kcal/mol were regarded as outliers and were analyzed in detail. We have previously shown that a simple random forest classifier can be effectively trained using general geometric descriptors for down-selecting reaction conformations. 14 Here have followed a similar approach, but have used features based on the imaginary normal mode of the TS (which gives a sense of direction in configuration space with respect to the reactant and product geometries) 22,23 in addition to geometric factors. As a prerequisite to calculating the features, the distance changes (DCs) associated with displacement along the imaginary normal mode were calculated between each pair of nearby atoms. Atoms were defined as near each other if they were within twice their hard shell radii (as estimated from tabulated atomic xix radii from the Universal Force Field 24 ). The distance change was defined as the difference between the separations of each atom pair after a positive and negative displacement along the imaginary mode (transition state structure plus/minus half of the imaginary frequency mode, Fig. S9). The atomic separations were calculated for all nearby pairs in each displaced geometry, then the difference between the separations in each geometry was calculated as each DC.
Two input features were calculated as the average (IDC mean ) and minimum (IDC min ) value of the DCs for active bonds (bonds that break/form in the target reaction, denoted as intended distance change, IDC). A third feature was based on the "unexpected" distance changes (UDC) associated with displacement along the imaginary mode, which was calculated from the DCs of inactive bonds (i.e., bonds not undergoing formation or breakage in the reaction) that were larger than IDC min . All UDCs were summed up as UDC sum to describe the unintended motions with respect to the target reaction. Finally, a geometry-based feature, S max , was calculated as the maximum separation between each active bond in the transition state geometry. In the example of Figure Four additional features were developed to describe the structural changes occurring between the low-level TS geometry and the high-level TS for use in featurizing the M2 model. The root-mean-squared displacements (RMSD) of the whole structure and of the atoms involved in the reaction (i.e., involved in bonds that break or form) were calculated between the high-level and low-level TSs as two additional features to represent the structural change. The summation of bond length changes between the high-level and low-level TSs xx was also included as a feature to highlight potential bond changes. Finally, the low-level IRC classification (intended or unintended) served as an eighth indicator for the M2 model. In contrast to the first four features, the latter four are specific to differences between specific low-level and high-level models and are thus less transferable. As shown below, there is a marginal performance difference between these models. models. The whole dataset was composed of 596 intended TSs (true) and 469 unintended TSs (false) and a random 80:20 training:testing split was used. A grid-search and 5-fold crossvalidation using the training data was used to determine optimal values for the learning rate (lr), maximum decision tree depth (max_depth), minimum sum of instance weight needed in a child (min_child_weight) and minimum loss reduction required to make a further partition (gamma). After the same tuning procedure, the performance of the M1 and M2 model on the same testing set was provided in Table S2, along with the performance of an earlier model trained by Grambow et al. 22 that solely uses the largest bond length change in the imaginary frequency mode for comparison. The comparison shows both M1 and M2 model perform better than the Grambow model, which misclassifies 22% of intended channels as unintended. Meanwhile, the addition of the latter four indicators results in only a 0.04 increase in "recall", which implies that the general M2 model is sufficient for characterizing xxi the transition states.  Figure S10: Other products discovered during the one-step exploration using L-glucose as a reactant. These 15 products have higher activation energies than the low-barrier pathways highlighted in the main text.

Additional Figures and Tables
xxii